intrinsic objective
Intrinsically-Motivated Humans and Agents in Open-World Exploration
Lidayan, Aly, Du, Yuqing, Kosoy, Eliza, Rufova, Maria, Abbeel, Pieter, Gopnik, Alison
What drives exploration? Understanding intrinsic motivation is a long-standing challenge in both cognitive science and artificial intelligence; numerous objectives have been proposed and used to train agents, yet there remains a gap between human and agent exploration. We directly compare adults, children, and AI agents in a complex open-ended environment, Crafter, and study how common intrinsic objectives (Entropy, Information Gain, and Empowerment) relate to their behavior. We find that only Entropy and Empowerment are consistently positively correlated with human exploration progress, indicating that these objectives may better inform intrinsic reward design for agents. Furthermore, across agents and humans we observe that Entropy initially increases rapidly, then plateaus, while Empowerment increases continuously, suggesting that state diversity may provide more signal in early exploration, while advanced exploration should prioritize control. Finally, we find preliminary evidence that private speech utterances, and particularly goal verbalizations, may aid exploration in children.
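To make the Entropy objective above concrete, the following is a minimal sketch that reads it as the Shannon entropy of the empirical state-visitation distribution. This reading, the function names `visitation_entropy` and `entropy_curve`, and the use of hashable state labels are assumptions for illustration; the abstract does not specify the estimator used in the paper.

```python
import math
from collections import Counter


def visitation_entropy(states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution.

    `states` is any sequence of hashable state labels (hypothetical encoding).
    """
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_curve(states):
    """Entropy of the visitation distribution after each prefix of a trajectory."""
    return [visitation_entropy(states[: t + 1]) for t in range(len(states))]


if __name__ == "__main__":
    # A toy trajectory: new states early on, then revisits of known states.
    trajectory = ["a", "b", "c", "d", "a", "b", "a", "a"]
    for step, h in enumerate(entropy_curve(trajectory), start=1):
        print(step, round(h, 3))
```

Under this reading, the curve rises while new states keep appearing and stops rising once the trajectory mostly revisits known states, which is one simple way a plateau like the one reported above could arise.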
Constrained Intrinsic Motivation for Reinforcement Learning
Zheng, Xiang, Ma, Xingjun, Shen, Chao, Wang, Cong
This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks. To tackle these problems, we propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks, respectively: 1) CIM for RFPT maximizes the lower bound of the conditional state entropy subject to an alignment constraint on the state encoder network for efficient dynamic and diverse skill discovery and state coverage maximization; 2) CIM for EIM leverages constrained policy optimization to adaptively adjust the coefficient of the intrinsic objective to mitigate the distraction from the intrinsic objective. In various MuJoCo robotics environments, we empirically show that CIM for RFPT greatly surpasses fifteen IM methods for unsupervised skill discovery in terms of skill diversity, state coverage, and fine-tuning performance. Additionally, we showcase the effectiveness of CIM for EIM in redeeming intrinsic rewards when task rewards are exposed from the beginning. Our code is available at https://github.com/x-zheng16/CIM.
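The idea of adaptively adjusting the coefficient of the intrinsic objective can be illustrated with a generic Lagrangian-style schedule. This is a sketch under stated assumptions, not the CIM update rule: the abstract does not give the constraint or the update, so the constraint used here (task return should reach a target), the function names, and the parameters `target_return`, `lr`, and `coef_max` are all hypothetical.

```python
def update_intrinsic_coefficient(coef, task_return, target_return,
                                 lr=0.1, coef_max=1.0):
    """One dual-style step on the intrinsic-reward coefficient (hypothetical rule).

    The coefficient shrinks when the task return falls short of the target and
    grows when there is slack; it is clipped to the range [0, coef_max].
    """
    slack = task_return - target_return  # > 0 means the constraint is satisfied
    new_coef = coef + lr * slack
    return min(max(new_coef, 0.0), coef_max)


def mixed_reward(task_reward, intrinsic_reward, coef):
    """Reward actually optimized: task reward plus a weighted intrinsic bonus."""
    return task_reward + coef * intrinsic_reward


if __name__ == "__main__":
    coef = 0.5
    # Task return below target for three updates, then above it for two.
    for task_return in [0.0, 0.2, 0.4, 1.5, 2.0]:
        coef = update_intrinsic_coefficient(coef, task_return, target_return=1.0)
        print(round(coef, 3))
```

The design choice in this sketch is that the intrinsic bonus is treated as subordinate to the task: it loses weight whenever task performance falls short, which is one way to limit the distraction the abstract mentions.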
Evaluating Agents without Rewards
Matusch, Brendon, Ba, Jimmy, Hafner, Danijar
[...] solve challenging tasks in unknown environments. However, manually crafting reward functions can be time consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D [...]
Table 1: We computed Pearson correlation coefficients of each intrinsic objective with task reward and human similarity across 3 Atari games and Minecraft from over 2 billion time steps. The intrinsic objectives correlate more strongly with human similarity than with task reward.
Objective          Reward Correlation   Human Similarity Correlation
Task Reward        1.00                 0.67
Human Similarity   0.67                 1.00
Input Entropy      0.54                 0.89
Information Gain   0.49                 0.79
Empowerment        0.41                 0.66
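The retrospective comparison described above amounts to computing each objective on logged data and correlating it with reference metrics. The sketch below shows that correlation step only, assuming per-run scalar values are already available. The helper names `pearson` and `correlation_table` and the toy numbers are illustrative assumptions, not values or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_table(objectives, references):
    """Correlate every objective with every reference metric.

    Both arguments map a name to a list of per-run values (hypothetical layout).
    """
    return {
        obj_name: {
            ref_name: pearson(obj_vals, ref_vals)
            for ref_name, ref_vals in references.items()
        }
        for obj_name, obj_vals in objectives.items()
    }


if __name__ == "__main__":
    # Toy per-run values, made up for illustration only.
    objectives = {"input_entropy": [0.1, 0.4, 0.5, 0.9]}
    references = {
        "task_reward": [1.0, 3.0, 2.0, 5.0],
        "human_similarity": [0.2, 0.5, 0.6, 0.9],
    }
    for obj, row in correlation_table(objectives, references).items():
        print(obj, {ref: round(r, 2) for ref, r in row.items()})
```

Because the objectives are computed after the fact, the same logged dataset can be reused to score any number of candidate objectives without training a new agent for each one.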